In search of right-sized storage: our experiences and incidents
Storage choice for our faculty
300 students (undergraduate or graduate), 20 staff, studying:
Hardware, such as FPGAs and CPUs
A.I., such as neural networks, image recognition, and machine learning
Networking, such as the Internet and Wi-Fi
Software, such as operating systems and programming languages
Every student has their own notebook PC (a MacBook)
We have been using OS X since 2002 (earlier than Tokyo University)
Parents always ask us why
It is the only commercially supported Unix available to consumers
BYOD since 2002
Our System is updated every 5 years
1997: Sun Enterprise 3000 x 2 + NetApp
2001: cheap PCs (MiNT PC) x 2 + Newtech RAID x 2
2005: HP DL380 x 2 + HP DL380 RAID + Apple Xserve / Xserve RAID
2006: 1U Core Duo x 180 cluster
2010: Fujitsu blade server with SAN / VMware
2015: Dell PowerEdge R630 x 4 with SAN / KVM + GFS2, Sakura cloud
2020: Dell PowerEdge R740 x 4 + R740xd x 2 (32GB / 48TB), AWS Educate / Sakura cloud
Requirements
No critical processing (for student study only)
Maintained by a group of students with supporting staff
We want to know the internals
Possible choices / evaluation
VMware
Datrium
Pure Storage
Nutanix
VMware
* very expensive *
It is basically just Solaris with a SAN
Very flexible; various templates for students and researchers
GFS2 + KVM
Open source
A Linux PC cluster with GFS2 is very difficult to handle
The lock manager (DLM) is a single point of failure; it is very easy to bring the whole cluster down
VM migration is easy on GFS2 / SAN
Very slow
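For reference, a minimal sketch of this kind of clustered setup (the cluster name, volume group, and journal count are illustrative, not our exact configuration): the DLM lock protocol is chosen at mkfs time, which is why every node stalls when the lock manager stops.

# format a shared LV for a 4-node cluster (4 journals), using the DLM lock manager
mkfs.gfs2 -p lock_dlm -t ourcluster:vmstore -j 4 /dev/vg_vm/lv_vm
# every node mounts the same device; all locking goes through DLM / Corosync
mount -t gfs2 /dev/vg_vm/lv_vm /var/lib/libvirt/images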
Pure Storage
FPGA-based compression storage
SAN
HCI
Choice
HCI
SDS (software-defined storage)
Datrium / Nutanix
So-called HCI. The internals are hidden in their implementations. We cannot log in to the hypervisor of a Nutanix system.
Our requirements
Educational purpose
We want to see the internals
Open source
Maintenance by a group of students
Current System
No iSCSI network
HDD node x 2 + SSD node x 4 with GPU
Ceph
Sakura Rental server
Ceph
Distributed Object Storage
OSD: object storage daemon
MDS: metadata server
MON: monitor / lock / queue
cephadm: container-based packaging / administration tool
All Ceph commands have to be executed in a cephadm container
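As a rough illustration of the "everything runs inside a cephadm container" point, these are the standard commands (all part of stock Ceph/cephadm, nothing site-specific):

# open a shell in a container that has the ceph CLI and the cluster keyring
cephadm shell
# inside the container: overall health and the daemon layout across nodes
ceph -s
ceph orch ps
ceph osd tree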
Ceph requirements
No multiple OSDs on a single HDD/SSD
Each OSD requires 4GB of buffer memory (what?!) (see the config sketch below)
At least 3 MONs are required
We should not run MDS/MON on an OSD node
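A sketch of where those numbers live, assuming a recent cephadm-managed cluster: the ~4GB per OSD is the default osd_memory_target, and the 3-MON rule is about keeping a quorum.

# the per-OSD memory comes from osd_memory_target; it can be lowered at the cost of cache hit rate
ceph config get osd osd_memory_target
ceph config set osd osd_memory_target 2147483648   # 2GB, only a sketch, not a recommendation
# confirm that at least 3 monitors are in quorum
ceph quorum_status --format json-pretty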
So...
Ceph requires rather expensive resources
NetApp trouble in 1998
A bad configuration setting stopped everything
An international telephone call in the middle of the night
It had a snapshot including the system configuration (very good)
Vendor-supported storage is very good
RAID trouble
RAID technology is not a backup
Rebuild procedure
Bad things may happen during a rebuild (as always)
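To illustrate why the rebuild window is the dangerous part, here is how it looks on Linux software RAID with mdadm; our arrays were hardware RAID boxes, so this is only an analogy, and the device names are placeholders.

# watch rebuild progress; a second disk failure in this window loses the array
cat /proc/mdstat
mdadm --detail /dev/md0
# mark the broken member failed, remove it, add a replacement, and let the array resync
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sdc1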
GFS2 trouble
Linux cluster with Corosync
DLM is a single point of failure
Very easy to stop
Recovery
GFS2 is on LVM (possibly software-RAIDed)
Read the volume group configuration in the system area (QNAP requires the same kind of trick)
Remove the cluster flag in the VG header:
status = ["RESIZEABLE", "READ", "WRITE", "CLUSTERED"]
Remove the SCSI reservation:
sg_persist --out --no-inquiry --clear --param-rk=0xb8c90000 --device=/dev/sdb
Mount without the lock manager:
mount -o lockproto=lock_nolock /dev/mapper/vg_whisky-lv_whisky /mnt/whisky/
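One way to do the "remove the cluster flag" step with the standard LVM metadata tools, a sketch assuming the volume group is vg_whisky as above and that it is being repaired on a single recovery host:

# dump the VG metadata, edit the status line to drop "CLUSTERED", then restore it
vgcfgbackup -f /tmp/vg_whisky.cfg vg_whisky
# edit /tmp/vg_whisky.cfg so that: status = ["RESIZEABLE", "READ", "WRITE"]
vgcfgrestore -f /tmp/vg_whisky.cfg vg_whisky
# activate the LV locally before the lock_nolock mount shown above
vgchange -ay vg_whisky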
Ceph trouble
Ceph contains everything in the OSDs, but...
Without the MDS (metadata), there is no way to access the contents
If we change the IP addresses, the linkage between MDS and OSDs is lost; that is, we lose everything
So basically,
we cannot change the IP addresses of the OSD/MDS/MON nodes
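The monitor addresses are what pin the cluster together: they are recorded in the monmap, which is why a casual IP change breaks the linkage. The maps can at least be inspected with read-only commands (standard Ceph tooling, safe to run):

# show the monitor map with the fixed IP address of each MON
ceph mon dump
# the same map can be exported and examined offline
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap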
Recovery
There are possible ways to recover data from the OSDs, but they are not so easy
But we have a cheap NetGear RAID box with mirrored disks, which helped us with the recovery
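For the record, the "possible but not so easy" path goes through ceph-objectstore-tool, which can export placement groups directly from a stopped OSD; the OSD id, PG id, and output path below are placeholders.

# run against a stopped OSD's data directory
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 2.1f --op export --file /backup/pg.2.1f.export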
So which is better, GFS2 or Ceph?
GFS2 relies on RAID / SAN technology, with direct write-through over iSCSI
Ceph is based on erasure coding and a large memory buffer
Ceph cannot yet handle:
heterogeneous configurations (NVMe/SSD/HDD)
dedup
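To make the erasure-coding point concrete, this is roughly how an EC pool is declared in Ceph (the profile name, k/m values, and pool name are illustrative): data is split into k data chunks plus m coding chunks and spread across OSDs, instead of relying on a RAID controller underneath.

# 4 data chunks + 2 coding chunks, i.e. the pool survives the loss of any 2 OSDs
ceph osd erasure-code-profile set ec42 k=4 m=2
ceph osd pool create ecpool 64 64 erasure ec42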
Thank you!